1,485 research outputs found

    The case for cloud computing in genome informatics

    With DNA sequencing now getting cheaper more quickly than data storage, the time may have come to use cloud computing for genome informatics

    ISOWN: accurate somatic mutation identification in the absence of normal tissue controls.

    Background: A key step in cancer genome analysis is the identification of somatic mutations in the tumor. This is typically done by comparing the genome of the tumor to the reference genome sequence derived from a normal tissue taken from the same donor. However, there are a variety of common scenarios in which matched normal tissue is not available for comparison. Results: In this work, we describe an algorithm to distinguish somatic single nucleotide variants (SNVs) in next-generation sequencing data from germline polymorphisms in the absence of normal samples using a machine learning approach. Our algorithm was evaluated using a family of supervised learning classifications across six different cancer types and ~1600 samples, including cell lines, fresh frozen tissues, and formalin-fixed paraffin-embedded tissues; we tested our algorithm with both deep targeted and whole-exome sequencing data. Our algorithm correctly classified between 95 and 98% of somatic mutations, with F1-measures ranging from 75.9 to 98.6% depending on the tumor type. We have released the algorithm as a software package called ISOWN (Identification of SOmatic mutations Without matching Normal tissues). Conclusions: In this work, we describe the development, implementation, and validation of ISOWN, an accurate algorithm for predicting somatic mutations in cancer tissues in the absence of matching normal tissues. ISOWN is available as Open Source under Apache License 2.0 from https://github.com/ikalatskaya/ISOWN
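
    The F1-measure quoted in this abstract is the harmonic mean of precision and recall. The following is a minimal Python sketch of that definition only; the function name and the example counts are hypothetical and are not taken from the ISOWN evaluation or its code base.

```python
def f1_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1-measure (harmonic mean of precision and recall).

    Returns 0.0 when there are no true positives, where precision
    and recall are zero or undefined.
    """
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # fraction of calls that are correct
    recall = true_pos / (true_pos + false_neg)     # fraction of true variants recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(f"F1 = {f1_measure(true_pos=90, false_pos=10, false_neg=30):.3f}")
```

    Because F1 weighs precision and recall together, a classifier can recover most somatic mutations and still score a lower F1 if it also mislabels many germline variants as somatic.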

    A cancer cell-line titration series for evaluating somatic classification.

    Background: Accurate detection of somatic single nucleotide variants and small insertions and deletions from DNA sequencing experiments of tumour-normal pairs is a challenging task. Tumour samples are often contaminated with normal cells confounding the available evidence for the somatic variants. Furthermore, tumours are heterogeneous so sub-clonal variants are observed at reduced allele frequencies. We present here a cell-line titration series dataset that can be used to evaluate somatic variant calling pipelines with the goal of reliably calling true somatic mutations at low allele frequencies. Results: Cell-line DNA was mixed with matched normal DNA at 8 different ratios to generate samples with known tumour cellularities, and exome sequenced on Illumina HiSeq to depths of >300×. The data was processed with several different variant calling pipelines and verification experiments were performed to assay >1500 somatic variant candidates using Ion Torrent PGM as an orthogonal technology. By examining the variants called at varying cellularities and depths of coverage, we show that the best performing pipelines are able to maintain a high level of precision at any cellularity. In addition, we estimate the number of true somatic variants undetected as cellularity and coverage decrease. Conclusions: Our cell-line titration series dataset, along with the associated verification results, was effective for this evaluation and will serve as a valuable dataset for future somatic calling algorithm development. The data is available for further analysis at the European Genome-phenome Archive under accession number EGAS00001001016. Data access requires registration through the International Cancer Genome Consortium's Data Access Compliance Office (ICGC DACO)
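
    To illustrate why low cellularity makes somatic variants harder to call, the sketch below estimates the expected variant allele fraction and supporting read count for a given tumour cellularity. It rests on simplifying assumptions that are not stated in the abstract: the variant is clonal and heterozygous, and the locus is diploid and copy-neutral in both tumour and normal cells. The function names and the example cellularities are hypothetical and are not the eight ratios used in the study.

```python
def expected_vaf(cellularity: float) -> float:
    """Expected variant allele fraction at a given tumour cellularity.

    Assumes a clonal, heterozygous somatic variant at a diploid,
    copy-neutral locus, so one of two alleles in each tumour cell
    carries the variant.
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must be between 0 and 1")
    return cellularity / 2.0


def expected_variant_reads(cellularity: float, depth: int) -> float:
    """Expected number of reads supporting the variant at a given depth."""
    return expected_vaf(cellularity) * depth


if __name__ == "__main__":
    # Hypothetical cellularities at the ~300x depth mentioned in the abstract.
    for cellularity in (1.0, 0.5, 0.1):
        vaf = expected_vaf(cellularity)
        reads = expected_variant_reads(cellularity, depth=300)
        print(f"cellularity={cellularity:.2f}  VAF={vaf:.3f}  reads={reads:.1f}")
```

    Under these assumptions the supporting evidence shrinks linearly with cellularity, which is consistent with the abstract's point that more true variants go undetected as cellularity and coverage decrease.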

    Gallus GBrowse: a unified genomic database for the chicken

    Gallus GBrowse (http://birdbase.net/cgi-bin/gbrowse/gallus/) provides online access to genomic and other information about the chicken, Gallus gallus. The information provided by this resource includes predicted genes and Gene Ontology (GO) terms, links to Gallus In Situ Hybridization Analysis (GEISHA), Unigene and Reactome, the genomic positions of chicken genetic markers, SNPs and microarray probes, and mappings from turkey, condor and zebra finch DNA and EST sequences to the chicken genome. We also provide a BLAT server (http://birdbase.net/cgi-bin/webBlat) for matching user-provided sequences to the chicken genome. These tools make the Gallus GBrowse server a valuable resource for researchers seeking genomic information regarding the chicken and other avian species

    GMODWeb: a web framework for the generic model organism database

    The Generic Model Organism Database (GMOD) initiative provides species-agnostic data models and software tools for representing curated model organism data. Here we describe GMODWeb, a GMOD project designed to speed the development of Model Organism Database (MOD) websites. Sites created with GMODWeb provide integration with other GMOD tools and allow users to browse and search through a variety of data types. GMODWeb was built using the open source Turnkey web framework and is available from http://turnkey.sourceforge.net

    Reactome - a knowledgebase of human biological pathways

    Pathway curation is a powerful tool for systematically associating gene products with functions. Reactome (www.reactome.org) is a manually curated human pathway knowledgebase describing a wide range of biological processes in a computationally accessible manner. The core unit of the Reactome data model is the Reaction, whose instances form a network of biological interactions through entities that are consumed, produced, or act as catalysts. Entities are distinguished by their molecular identities and cellular locations. Set objects allow grouping of related entities. Curation is based on communication between expert authors and staff curators, facilitated by freely available data entry tools. Manually curated data are subjected to quality control and peer review by a second expert. Reactome data are released quarterly. At release time, electronic orthology inference performed on human data produces reaction predictions in 22 species ranging from mouse to bacteria. Cross-references to a large number of publicly available databases are attached, providing multiple entry points into the database. The Reactome Mart allows query submission and data retrieval from Reactome and across other databases. The SkyPainter tool provides visualization and statistical analysis of user-supplied data, e.g. from microarray experiments. Reactome data are freely available in a number of data formats (e.g. BioPAX, SBML)

    Exome sequencing identifies nonsegregating nonsense ATM and PALB2 variants in familial pancreatic cancer.

    We sequenced 11 germline exomes from five families with familial pancreatic cancer (FPC). One proband had a germline nonsense variant in ATM with somatic loss of the variant allele. Another proband had a nonsense variant in PALB2 with somatic loss of the variant allele. Both variants were absent in a relative with FPC. These findings question the causal mechanisms of ATM and PALB2 in these families and highlight challenges in identifying the causes of familial cancer syndromes using exome sequencing

    Profiling Pre-Replication Complex Mutations in Cancer

    The pre-replication complex (preRC) consists of 15 proteins that mark DNA replication initiation sites and regulate replication timing. Deficiency in preRC proteins results in genomic instability (re-replication) and developmental defects (Meier-Gorlin syndrome). Our aim was to assess the scope of preRC gene aberrations in cancer. Variations in preRC genes were studied using the cBioPortal software and the TCGA PanCancer dataset. The functional impact of detected variants was evaluated in silico by three different prediction tools: SIFT (sequence and evolutionary conservation-based), PolyPhen2 (protein sequence and structure-based) and MutPred2 (a supervised learning method based on neural networks). No mutational hotspots were observed in any of the 15 preRC genes and no mutual exclusivity between mutations in preRC genes was detected. The highest alteration incidence in preRC genes was found in endometrial carcinoma and melanoma. The majority of the variations seen in preRC genes were non-synonymous. The functional assessment showed that 253/1215 (21%) preRC gene mutations were predicted to be pathogenic with high confidence by two of the three computational algorithms. None of the variants reached the high-confidence pathogenicity score with all three prediction tools. In contrast, 49% of variants were predicted to be either benign by all three tools or benign by 2/3 or 1/3 tools, with the remaining 1/3 or 2/3, respectively, classifying them as low-confidence pathogenic. These findings suggest that mutations in preRC proteins might be passenger mutations and that cancer cells can tolerate them. A future step is to test whether the incidence of coding vs. noncoding preRC mutations correlates with the Tumor Mutation Burden (TMB) and Genome Instability Index (GII) of the cancer. Book of abstracts: 4th Belgrade Bioinformatics Conference, June 19-23, 202

    WormBook: the online review of Caenorhabditis elegans biology

    WormBook is an open-access, online collection of original, peer-reviewed chapters on the biology of Caenorhabditis elegans and related nematodes. Since WormBook was launched in June 2005 with 12 chapters, it has grown to over 100 chapters, covering nearly every aspect of C. elegans research, from Cell Biology and Neurobiology to Evolution and Ecology. WormBook also serves as the text companion to WormBase, the C. elegans model organism database. Objects such as genes, proteins and cells are linked to the relevant pages in WormBase, providing easily accessible background information. Additionally, WormBook chapters contain links to other relevant topics in WormBook, and the in-text citations are linked to their abstracts in PubMed and full-text references, if available. Since WormBook is online, its chapters are able to contain movies and complex images that would not be possible in a print version. WormBook is designed to keep up with the rapid pace of discovery in the field of C. elegans research and continues to grow. WormBook represents a generic publishing infrastructure that is easily adaptable to other research communities to facilitate the dissemination of knowledge in the field